METHOD AND APPARATUS FOR SWITCHING BETWEEN MULTIPLE IMPLEMENTATIONS OF A ROUTINE

Gary A. Coutant

Carol L. Thompson

FIELD OF THE INVENTION

The present invention generally relates to management of binary program code for different implementations of a processor architecture, and more particularly to switching between multiple implementations of a routine that are associated with the implementations of a processor architecture.

BACKGROUND

It is common to build multiple implementations of a basic processor architecture for the purpose of providing various performance and pricing options. For example, a processor architecture that supports a particular instruction set may have a first-level cache in one implementation, and another implementation may have first and second level caches.

Some system-provided routines and other library routines are written and compiled to exploit the performance characteristics of a particular implementation of a processor. For example, a memory-to-memory copy routine may be written and compiled differently from one implementation to another. While the binary code will execute correctly on any implementation, there may be a negative impact on performance when the binary code is executed on an implementation other than the target implementation.

A software developer can either develop separate binary libraries for the different implementations, develop a single binary for all implementations, or develop several versions of selected routines along with a run-time switch for selecting between the different versions. Developing separate binary libraries is technically straightforward. However, there is a cost associated with managing and distributing separate binary libraries. If only a single binary library is provided, users may not receive the full performance benefit of a particular implementation. While run-time switches would appear to provide a reasonable tradeoff between the management of multiple binary libraries and performance degradation, the run-time switch introduces overhead because each library call must determine the implementation on which the code is executing.

A method and apparatus that address the aforementioned problems, as well as other related problems, are therefore desirable.

SUMMARY OF THE INVENTION

In various embodiments, a method and apparatus are provided for switching between multiple implementations of a routine. In one embodiment, different versions of a library routine are programmed to exploit different features of different computer systems. The different versions are available in a single library, and an application program need not differentiate between the different implementations in using the routine. Using hardware characteristics that are associated with the different versions and hardware characteristics of the computer system on which the application is to be executed, references to the routine are resolved when the application and library are loaded. Thus, execution of the application is not burdened with runtime resolution of references to the routine.

It will be appreciated that various other embodiments are set forth in the Detailed Description and Claims which follow.

BRIEF DESCRIPTION OF THE DRAWINGS

Various aspects and advantages of the invention will become apparent upon review of the following detailed description and upon reference to the drawings in which:

FIG. 1 is a flowchart of a process for generating multiple binary implementations of a routine for different implementations of a particular processor architecture;

FIG. 2 is a block diagram illustrating a symbol table in relationship to a shared library of object code modules; and

FIG. 3 is a flowchart of a process for loading a library routine in accordance with one embodiment of the invention.

DETAILED DESCRIPTION

In various embodiments, the invention provides a technique for switching between multiple implementations of a library routine that are available in a library of routines. Each implementation of a routine has an associated set of hardware characteristics that indicate the hardware on which the implementation is intended to execute. The hardware characteristics may include, for example, the processor clock speed or a model number, cache configuration, latency of selected hardware operations (load and store, for example), and the availability of certain extensions to the instruction set. When a routine having multiple implementations is loaded, the reference is resolved to the appropriate implementation using the associated hardware characteristics and the hardware characteristics of the host system. Additional hardware characteristics that may be used for different implementations of routines include, for example, bypass characteristics, branch prediction behavior, pre-fetching capability, information describing stall conditions, branch penalties, size and associativity of processor data structures (not just cache, but branch prediction and ALAT-like structures as well), queue sizes for out-of-order or decoupled processors, and the number of processors in a multi-processor system.
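
A set of hardware characteristics of the kind listed above could be represented as a simple immutable record. The Python sketch below is only one possible encoding; the class name `HardwareCharacteristics`, its field names, and the sample values are assumptions chosen for illustration and cover only a few of the characteristics the description enumerates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareCharacteristics:
    """Hypothetical record for a subset of the characteristics named above."""
    processor_model: str
    clock_mhz: int
    cache_levels: int
    load_latency_cycles: int
    isa_extensions: frozenset = frozenset()


# Two illustrative sets: one for an implementation with a first-level cache
# only, one for an implementation with first- and second-level caches.
one_level = HardwareCharacteristics("model-a", 500, 1, 2)
two_level = HardwareCharacteristics("model-b", 800, 2, 3, frozenset({"simd"}))
```

Making the record frozen (and therefore hashable) lets a set of characteristics serve as a dictionary key or be compared for equality, which is convenient when a loader later looks for a matching entry.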

FIG. 1 is a flowchart of a process for generating multiple binary implementations of a routine for different implementations of a particular processor architecture, in accordance with one embodiment of the invention. The process generally entails, for each implementation of a routine, generating object code modules for the different implementations and adding entries in a symbol table for the different object code modules. An entry in the symbol table for a routine having multiple implementations has an associated set of hardware characteristics. The hardware characteristics are those of the platform on which the associated object code module is intended to execute.

At step 102, the first (or next, depending on the iteration) implementation of the routine is obtained for processing, and at step 104 the hardware characteristics associated with the routine are obtained.

The symbol table is updated at step 106. The symbol table includes names of routines and references to the associated object code modules. For routines having multiple implementations, hardware characteristics are also associated with the routine name in the symbol table. When an application is loaded and bound to the shared library that contains the multiple binary implementations of a routine, the system dynamic loader selects the appropriate implementation from the symbol table based on the hardware characteristics of the host system.

In one embodiment, the hardware characteristics that are to be associated with a routine are provided by the developer in a configuration file that is read by the linker at the time the shared library is being built. In this embodiment, the programmer will have coded different versions of the routine and used different names for the different versions. The information in the configuration file associates the names of the different versions with sets of hardware characteristics and with a generic name.
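
The description does not specify a format for the configuration file. Purely as an illustration, the Python sketch below assumes a JSON layout in which each generic routine name maps to a list of version names and their hardware characteristics; the layout, the `read_mapping` helper, and the names `memcopy`, `memcopy_l1`, and `memcopy_l2` are all hypothetical.

```python
import json

# Assumed configuration text: generic name -> versions with their
# hardware characteristics. The format is invented for this sketch.
CONFIG_TEXT = """
{
  "memcopy": [
    {"symbol": "memcopy_l1", "hardware": {"cache_levels": 1}},
    {"symbol": "memcopy_l2", "hardware": {"cache_levels": 2}}
  ]
}
"""


def read_mapping(text):
    """Return {generic_name: [(version_symbol, hardware_dict), ...]}."""
    raw = json.loads(text)
    return {
        generic: [(v["symbol"], v["hardware"]) for v in versions]
        for generic, versions in raw.items()
    }


mapping = read_mapping(CONFIG_TEXT)
```

In this sketch the programmer-chosen version names (`memcopy_l1`, `memcopy_l2`) stay distinct inside the library, while applications refer only to the generic name `memcopy`.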

In another embodiment, the compiler could be adapted to generate multiple object code modules from a source code module. For example, in response to a command-line option, the compiler automatically generates hardware-specialized implementations, generates unique symbol names for the specialized implementations, and generates the mapping information. The mapping information associates the generic routine name with the specialized implementations, and the specialized implementations with sets of hardware characteristics. The mapping information may be stored either in the object file or in a separate configuration file that is used by the linker, for example.

At step 108, the object code module is created for the routine, and the object code module is added to the shared library at step 110. Each routine in the shared library is assigned a unique name.

At step 112, the entry in the symbol table (step 106) is updated to reference the associated object code module in the object code library. Decision step 114 tests whether there are additional implementations to process. If so, control is returned to step 102. Otherwise, the process for generating the multiple implementations is complete.
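
The loop of FIG. 1 (steps 102 through 114) can be sketched in Python as follows. This is a simplified model under stated assumptions, not the disclosed tooling: the function `build_library`, the tuple layout of its input, and the dictionary-based symbol table are hypothetical, and Python's built-in `compile` merely stands in for a real compiler producing object code.

```python
def build_library(implementations):
    """Model of FIG. 1.

    implementations: list of (generic_name, unique_name, hardware, source).
    Returns (symbol_table, shared_library).
    """
    symbol_table = []    # entries: {"name", "hardware", "module"}
    shared_library = {}  # unique name -> object code (stand-in)
    # Steps 102/104: obtain each implementation and its characteristics;
    # the loop itself plays the role of decision step 114.
    for generic, unique, hardware, source in implementations:
        entry = {"name": generic, "hardware": hardware, "module": None}
        symbol_table.append(entry)                     # step 106
        object_code = compile(source, unique, "exec")  # step 108 (stand-in)
        shared_library[unique] = object_code           # step 110
        entry["module"] = unique                       # step 112
    return symbol_table, shared_library


impls = [
    ("memcopy", "memcopy_l1", {"cache_levels": 1}, "result = 'one-level copy'"),
    ("memcopy", "memcopy_l2", {"cache_levels": 2}, "result = 'two-level copy'"),
]
table, library = build_library(impls)
```

Both symbol table entries carry the generic name `memcopy`, while each references a uniquely named module in the library, mirroring the requirement that each routine in the shared library has a unique name.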

FIG. 2 is a block diagram illustrating a symbol table in relationship to a shared library of object code modules. The routines in shared library 152 are object code modules that are associated with and referenced by the entries in symbol table 154. Routines 1 through (n+1) are illustrated in shared library 152. Routines 1 through n have single implementations, and two implementations are illustrated for example routine (n+1). The first implementation of routine (n+1) is named routine (n+1), and the second implementation of routine (n+1) is named routine (n+1)'.

Symbol table 154 includes entries for each routine and implementation. Routines 1 through n, having only single implementations, include only the routine name and a reference to the corresponding object code module in shared library 152. Routine (n+1) has two implementations, and the entries associated therewith include respective sets of hardware characteristics that describe, for example, the processor for which the implementations were developed. The entry having hardware characteristics set 1 references routine (n+1) code in shared library 152, and the entry having hardware characteristics set 2 references routine (n+1)' code in the shared library.
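
The arrangement of FIG. 2 can be modeled with a small table in which single-implementation routines carry no hardware characteristics and the multi-implementation routine appears once per implementation. The Python sketch below is illustrative only; `SymbolEntry`, `entries_for`, the file-like module references, and the `{"set": 1}` placeholders for "hardware characteristics set 1" and "set 2" are assumptions.

```python
from typing import NamedTuple, Optional


class SymbolEntry(NamedTuple):
    """Hypothetical symbol table entry."""
    name: str                        # routine name seen by the application
    module_ref: str                  # reference to object code in the library
    hardware: Optional[dict] = None  # present only for multi-version routines


symbol_table = [
    SymbolEntry("routine_1", "routine_1.o"),
    SymbolEntry("routine_n", "routine_n.o"),
    # Routine (n+1) has two entries that share a name but differ in
    # hardware characteristics and in the module they reference.
    SymbolEntry("routine_n1", "routine_n1.o", {"set": 1}),
    SymbolEntry("routine_n1", "routine_n1_prime.o", {"set": 2}),
]


def entries_for(name):
    """Return every entry recorded under the given routine name."""
    return [entry for entry in symbol_table if entry.name == name]
```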

When an application is loaded and bound to shared library 152, the system dynamic loader selects the appropriate binary implementation from the symbol table based on the hardware characteristics of the host processor. Thus, the reference to the appropriate binary implementation is resolved when the program using the shared library is loaded. Alternatively, some environments may load the shared library after a program begins execution, and in this environment the references are resolved when the shared library is loaded. In either environment, the references are resolved at load time rather than at runtime. The resolution of the references to routines in the symbol table results in code references to the addresses of the binary implementations in the shared library 152.

By selecting the appropriate object code routine once when the routine is first referenced instead of resolving the reference each time the routine is referenced at run-time, the overhead for the switch occurs in the compiler and loader, thereby eliminating issues with respect to run-time performance and switching to an appropriate implementation of a routine.

FIG. 3 is a flowchart of a process for loading a library routine in accordance with one embodiment of the invention. At step 302, the hardware characteristics are obtained from a system configuration file, for example. In another embodiment, the characteristics may be obtained, for example, from hardware identification registers or from firmware.
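
Step 302 might be approximated in Python as shown below. Python cannot portably read hardware identification registers or firmware, so this sketch uses `platform.machine()` and `os.cpu_count()` from the standard library as stand-ins and lets an optional configuration text (assumed here to be JSON) supply or override values. The function name `host_characteristics` and the key names are hypothetical.

```python
import json
import os
import platform


def host_characteristics(config_text=None):
    """Collect a small set of host characteristics (illustrative only)."""
    detected = {
        # Stand-ins for values a loader might read from identification
        # registers or firmware.
        "machine": platform.machine(),
        "processors": os.cpu_count() or 1,  # cpu_count() may return None
    }
    if config_text:
        # Values from an assumed system configuration file take precedence.
        detected.update(json.loads(config_text))
    return detected


host = host_characteristics('{"cache_levels": 2}')
```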

At step 304, the loader obtains the name of the routine to be loaded. For example, an application program may reference a particular shared library routine, and the loader uses the program-specified routine name to locate the proper object code module in the shared library.

The routine name and hardware characteristics are used at step 306 to match an entry in symbol table 154. Using the reference in the matching entry, step 308 loads the referenced object that is associated with the hardware characteristics. From the time that a routine is referenced and the proper object code module is identified and loaded, no further matching of hardware characteristics is required on subsequent references to the routine.
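
The loading process of FIG. 3 can be sketched in Python as a loader that matches once and then reuses the bound reference. This is a minimal model under stated assumptions: the `Loader` class, its tuple-based symbol table, and the matching rule (every characteristic listed in an entry must equal the host's value) are hypothetical, since the description does not define how a match is scored.

```python
class Loader:
    """Hypothetical load-time resolver modeled on FIG. 3."""

    def __init__(self, symbol_table, library, host_hardware):
        self.symbol_table = symbol_table  # (name, hardware or None, ref)
        self.library = library            # ref -> callable (stand-in)
        self.host = host_hardware         # step 302: host characteristics
        self.bound = {}                   # name -> resolved callable
        self.match_count = 0              # how often matching was performed

    def _matches(self, required):
        # Assumed rule: each listed characteristic must equal the host's.
        return required is None or all(
            self.host.get(key) == value for key, value in required.items()
        )

    def load(self, name):
        # Step 304: the routine name comes from the referencing program.
        if name in self.bound:
            return self.bound[name]       # already resolved, no matching
        self.match_count += 1
        for entry_name, required, ref in self.symbol_table:  # step 306
            if entry_name == name and self._matches(required):
                self.bound[name] = self.library[ref]  # load matched object
                return self.bound[name]
        raise LookupError(name)


library = {
    "copy_l1": lambda data: ("l1", list(data)),
    "copy_l2": lambda data: ("l2", list(data)),
}
symbol_table = [
    ("copy", {"cache_levels": 1}, "copy_l1"),
    ("copy", {"cache_levels": 2}, "copy_l2"),
]
loader = Loader(symbol_table, library, {"cache_levels": 2})
copy = loader.load("copy")
copy_again = loader.load("copy")
```

The second `load` call returns the already bound routine without consulting the hardware characteristics again, which corresponds to the statement that no further matching is required on subsequent references.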

The present invention is believed to be applicable to a variety of systems that switch between multiple implementations of a routine based on hardware characteristics. Other aspects and embodiments of the present invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. It is intended that the specification and illustrated embodiments be considered as examples only, with a true scope and spirit of the invention being indicated by the following claims.

CLAIMS

What is claimed is:

1. A computer-implemented method for switching between multiple implementations of a routine in a library of routines that are linked with an application program that is hosted by a computer system, comprising:

compiling a plurality of implementations of a routine into respective object code modules, the routine having an associated name and each implementation adapted to a selected hardware configuration;

associating the object code modules with the name of the routine and respective sets of hardware characteristics; and

resolving when the application program is loaded into memory of the computer system, a reference to the routine using the sets of hardware characteristics and a hardware configuration of the system.

2. The method of claim 1, further comprising establishing a symbol table having a plurality of entries, each entry including a name of a routine and a reference to an object code module in the library.

3. The method of claim 2, further comprising, for the routine having a plurality of implementations, adding a plurality of entries to the symbol table and associating respective sets of hardware characteristics with the plurality of entries.

4. The method of claim 3, wherein the hardware characteristics include at least one of clock speed of the processor, processor model, cache configuration of the system, hardware operation latency times, instruction set characteristics, bypass characteristics, branch prediction behavior, pre-fetching capability, information describing stall conditions, branch penalties, size and associativity of processor data structures, queue sizes for out-of-order or decoupled processors, and the number of processors in a multi-processor system.

5. The method of claim 4, wherein the resolving step further comprises obtaining the hardware configuration of the system from at least one of a system configuration data file, one or more system identification registers, and system firmware.

6. The method of claim 3, wherein the resolving step further comprises obtaining the hardware configuration of the system from at least one of a system configuration data file, one or more system identification registers, and system firmware.

7. The method of claim 1, wherein the hardware characteristics include at least one of clock speed of the processor, processor model, cache configuration of the system, hardware operation latency times, and instruction set characteristics.

8. The method of claim 1, wherein the resolving step further comprises obtaining the hardware configuration of the system from at least one of a system configuration data file, one or more system identification registers, and system firmware.

9. A computer-implemented method for switching between multiple implementations of a routine in a library of routines that are linked with an application program hosted by a computer system, comprising:

establishing a set of hardware configuration characteristics that describe the computer system;

establishing a symbol table, the symbol table having one or more entries that include a name of a routine, a set of hardware characteristics, and an address referencing a routine in the library;

obtaining a name of a routine having multiple implementations when the library is loaded with the application program into memory of the computer system;

matching the name of the routine and the set of hardware configuration characteristics that describe the computer system to an entry in the symbol table; and

generating an address in executable code for references to the routine having multiple implementations when the library is loaded with the application program, the address referencing an implementation in the library as identified in the matching step by the entry in the symbol table.

10. The method of claim 9, wherein the hardware configuration characteristics include at least one of clock speed of the processor, processor model, cache configuration of the system, hardware operation latency times, and instruction set characteristics.

11. The method of claim 10, wherein the resolving step further comprises obtaining the hardware configuration of the system from at least one of a system configuration data file, one or more system identification registers, and system firmware.

12. The method of claim 9, wherein the resolving step further comprises obtaining the hardware configuration of the system from at least one of a system configuration data file, one or more system identification registers, and system firmware.

13. An apparatus for switching between multiple implementations of a routine in a library of routines that are linked with an application program that is hosted by a computer system, comprising:

means for compiling a plurality of implementations of a routine into respective object code modules, the routine having an associated name and each implementation adapted to a selected hardware configuration;

means for associating the object code modules with the name of the routine and respective sets of hardware characteristics; and

means for resolving when the application program is loaded into memory of the computer system, a reference to the routine using the sets of hardware characteristics and a hardware configuration of the system.

14. A computer-implemented symbol table for referencing a library of object code modules that implement a plurality of routines, comprising:

a first set of one or more entries, each entry in the first set including a unique name of a routine and a reference to an object code module in the library; and

a second set of one or more entries, each entry in the second set including a shared name of a routine, a set of hardware characteristics, and a reference to an object code module in the library.

15. The symbol table of claim 14, wherein the hardware characteristics include at least one of clock speed of a processor, processor model, cache configuration, hardware operation latency times, instruction set characteristics, bypass characteristics, branch prediction behavior, pre-fetching capability, information describing stall conditions, branch penalties, size and associativity of processor data structures, queue sizes for out-of-order or decoupled processors, and the number of processors in a multi-processor system.

16. A computer program product configured for causing a computer to perform the steps of:

compiling a plurality of implementations of a routine into respective object code modules, the routine having an associated name and each implementation adapted to a selected hardware configuration;

associating the object code modules with the name of the routine and respective sets of hardware characteristics; and

resolving when the application program is loaded into memory of the computer system, a reference to the routine using the sets of hardware characteristics and a hardware configuration of the system.

ABSTRACT

Method and apparatus for switching between multiple implementations of a routine. A plurality of implementations of a routine are compiled into respective object code modules. In one embodiment, each implementation of the routine is adapted for a particular hardware configuration. The different object code modules are associated with respective sets of hardware characteristics and with the name of the routine. When the application program and library are loaded into memory of the computer system, references to the routine are resolved using the sets of hardware characteristics and the hardware configuration of the system.